
A simple, practical and complete O(n³/log n)-time Algorithm for RNA folding using the Four-Russians Speedup

Author: Dan Gusfield
IL Hofacker
J Kleinberg
M Zuker
MS Waterman
P Clote
R Backofen
R Durbin
R Nussinov
SE Seemann
SL Graham
T Akutsu
TM Chan
Y Wexler
Yelena Frid
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago, yet it continues to be an area of active research and development. The basic RNA-folding problem, finding a maximum-cardinality, non-crossing matching of complementary nucleotides in an RNA sequence of length n, has an O(n³)-time dynamic programming solution that is widely applied. It is known that an o(n³) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the O(n³) worst-case time bound when n is the only problem parameter used in the bound. Surprisingly, the most widely used general technique for achieving a worst-case (and often practical) speedup of dynamic programming, the Four-Russians technique, has not previously been applied to the RNA-folding problem. This is perhaps due to technical issues in adapting the technique to RNA folding.

Results: In this paper, we give a simple, complete, and practical Four-Russians algorithm for the basic RNA-folding problem, achieving a worst-case time bound of O(n³/log(n)).

Conclusions: We show that this time bound can also be obtained for richer nucleotide-matching scoring schemes, and that the method achieves consistent speedups in practice. The contribution is both theoretical and practical, since the basic RNA-folding problem is often solved multiple times in the inner loop of more complex algorithms, and for long RNA molecules in the study of RNA virus genomes.
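The basic RNA-folding problem described above can be made concrete with a short sketch. The Python below is an illustration under our own assumptions, not the authors' code: only Watson-Crick pairs (A-U, G-C) count as complementary, each pair scores 1, and there is no minimum hairpin-loop length. `nussinov` is the standard cubic dynamic program. `fold_four_russians` is a hypothetical, simplified rendering of the Four-Russians idea as we understand it: because adding one nucleotide changes the optimum by 0 or 1, q adjacent table entries can be encoded as a (q−1)-bit difference vector, and the best split point within a block of q candidates is read from a precomputed table instead of being scanned. The names `PAIRS`, `build_table`, the block size `q` and its default are illustrative choices.

```python
from typing import List, Optional

# Assumption: only Watson-Crick pairs are complementary; every pair scores 1.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq: str) -> int:
    """Classic O(n^3) DP: maximum number of non-crossing complementary pairs."""
    n = len(seq)
    if n == 0:
        return 0
    S = [[0] * n for _ in range(n)]  # S[i][j] = optimum for seq[i..j]; 0 if i >= j
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = inner + int((seq[i], seq[j]) in PAIRS)  # pair i with j
            for k in range(i, j):  # split [i..k] + [k+1..j]; also covers i or j unpaired
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


def build_table(q: int) -> List[int]:
    """R[(a << w) | b], w = q - 1: max over t in 0..w of
    (ones among the first t bits of a) - (ones among the first t bits of b)."""
    w = q - 1
    R = [0] * (1 << (2 * w))
    for a in range(1 << w):
        for b in range(1 << w):
            cur = best = 0
            for s in range(w):
                cur += ((a >> s) & 1) - ((b >> s) & 1)
                best = max(best, cur)
            R[(a << w) | b] = best
    return R


def fold_four_russians(seq: str, q: Optional[int] = None) -> int:
    """Same optimum as nussinov(); blocks of q split points are resolved by table lookup."""
    n = len(seq)
    if n == 0:
        return 0
    if q is None:
        q = max(2, n.bit_length() // 2)  # hypothetical default, roughly (log2 n) / 2
    if q < 2:
        raise ValueError("q must be at least 2")
    w, R = q - 1, build_table(q)
    S = [[0] * n for _ in range(n)]
    row_bits = {}  # (i, l0) -> bits S(i, l0+s-1) - S(i, l0+s-2), s = 1..w

    def get(i: int, j: int) -> int:
        return S[i][j] if i <= j else 0  # empty interval scores 0

    def row_vec(i: int, l0: int) -> int:
        if (i, l0) not in row_bits:  # uses only columns < current j, which are final
            row_bits[(i, l0)] = sum(
                (get(i, l0 + s - 1) - get(i, l0 + s - 2)) << (s - 1) for s in range(1, q))
        return row_bits[(i, l0)]

    for j in range(n):  # columns left to right
        col_bits = {}  # l0 -> bits S(l0+s-1, j) - S(l0+s, j), s = 1..w, for column j
        for i in range(j - 1, -1, -1):  # rows bottom to top
            best = get(i + 1, j - 1) + int((seq[i], seq[j]) in PAIRS)
            l = i + 1  # split point: [i..l-1] + [l..j]
            while l <= j:
                if l % q == 0 and l + w <= j:  # whole block l..l+w lies in (i, j]
                    key = (row_vec(i, l) << w) | col_bits[l]
                    best = max(best, get(i, l - 1) + S[l][j] + R[key])
                    l += q
                else:  # ragged edge: scan one split point directly
                    best = max(best, get(i, l - 1) + S[l][j])
                    l += 1
            S[i][j] = best
            if i % q == 0 and i + w <= j:  # rows i..i+w of column j are now final
                col_bits[i] = sum((S[i + s - 1][j] - S[i + s][j]) << (s - 1) for s in range(1, q))
    return S[0][n - 1]
```

For example, both functions should return 3 for `"GGGAAACCC"` (three G-C pairs). Each cell does O(q + n/q) work, so with q on the order of log n the fill step is O(n³/log n) once the 4^(q−1)-entry table is built; in pure Python the constant factors dominate at practical sizes, so treat this as a reading aid for the idea rather than a benchmark or the published algorithm.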

Directory of Open Access Journals

University of Salford Institutional Repository

BSGatlas: a unified Bacillus subtilis genome and transcriptome annotation atlas with enhanced information access

Author: Alkan F
Anthon C
Breüner A
Geissler AS
Gonzalez Tortuero E
Gorodkin J
Kallehauge TB
Poulsen LD
Seemann SE
Vinther J
Publication venue: Microbiology Society
Publication date: 01/01/2021

A large part of our current understanding of gene regulation in Gram-positive bacteria is based on Bacillus subtilis, as it is one of the most well-studied bacterial model systems. The rapidly growing data on its molecular and genomic biology are distributed across multiple annotation resources. Consequently, the interpretation of data from further B. subtilis experiments becomes increasingly challenging in both small- and large-scale analyses. Additionally, B. subtilis annotation of structured RNA and non-coding RNA (ncRNA), as well as of the operon structure, still lags behind the annotation of the coding sequences. To address these challenges, we created the B. subtilis genome atlas, BSGatlas, which integrates and unifies multiple existing annotation resources. Compared to any of the individual resources, the BSGatlas contains twice as many ncRNAs, while improving the positional annotation for 70% of the ncRNAs. Furthermore, we combined known transcription start and termination sites with lists of known co-transcribed gene sets to create a comprehensive transcript map. The combination with transcription start/termination site annotations resulted in 717 new sets of co-transcribed genes and 5335 untranslated regions (UTRs). In comparison to existing resources, the number of 5′ and 3′ UTRs increased nearly fivefold, and the number of internal UTRs doubled. The transcript map is organized in 2266 operons, which provide transcriptional annotation for 92% of all genes in the genome, compared with at most 82% in previous resources. We predicted an off-target-aware genome-wide library of CRISPR–Cas9 guide RNAs, which we also linked to polycistronic operons. We provide the BSGatlas in multiple forms: as a website (https://rth.dk/resources/bsgatlas/), an annotation hub for display in the UCSC genome browser, supplementary tables, and a standardized GFF3 format, which can be used in large-scale -omics studies.
By complementing existing resources, the BSGatlas supports analyses of the B. subtilis genome and its molecular biology with respect to not only non-coding genes but also genome-wide transcriptional relationships of all genes.
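To make the three UTR categories concrete, here is a minimal, hypothetical sketch (not the BSGatlas pipeline) of how a transcript bounded by a transcription start site (TSS) and a termination site (TTS) splits into a 5′ UTR, internal UTRs between co-transcribed genes, and a 3′ UTR. It assumes plus-strand, 1-based inclusive coordinates, at least one gene, and non-overlapping genes lying inside the transcript; minus-strand transcripts would mirror the logic. The function name `split_utrs` is our own.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive, plus-strand coordinates (assumption)


def split_utrs(tss: int, tts: int, genes: List[Interval]) -> Dict[str, object]:
    """Hypothetical helper: split transcript [tss, tts] into 5', internal and 3' UTRs.

    Assumes at least one gene, all genes inside the transcript, and no gene overlaps.
    """
    genes = sorted(genes)
    utr5: Optional[Interval] = (tss, genes[0][0] - 1) if genes[0][0] > tss else None
    utr3: Optional[Interval] = (genes[-1][1] + 1, tts) if genes[-1][1] < tts else None
    internal = [
        (prev_end + 1, next_start - 1)  # gap between consecutive co-transcribed genes
        for (_, prev_end), (next_start, _) in zip(genes, genes[1:])
        if next_start - prev_end > 1
    ]
    return {"5'UTR": utr5, "internal": internal, "3'UTR": utr3}
```

For instance, `split_utrs(100, 1000, [(150, 400), (420, 900)])` yields a 5′ UTR of (100, 149), one internal UTR of (401, 419) and a 3′ UTR of (901, 1000); genes that abut directly produce no internal UTR.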

Copenhagen University Research Information System

Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes

Author: B Chaudhuri
B Knudsen
C Notredame
Christian Theil Have
DG Longstaff
E Torarinsson
EP Nawrocki
GV Kryukov
Henning Christiansen
I Hofacker
IL Hofacker
IU Heinemann
J Atkins
J Reeder
JA Krzycki
JD Thompson
K Katoh
M Bauer
M Fujita
M Höchsmann
MA Gaston
N Wirth
S Bernhart
S Lindgreen
S Mørk
S Will
SE Seemann
SF Altschul
Sine Zambach
T Abe
TM Martin Simonsen
X Xu
Y Zhang
Z Yao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

BACKGROUND: Pyrrolysine (the 22nd amino acid) is, in certain organisms and under certain circumstances, encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream of UAG has been suggested, but the structure does not seem to be present in all pyrrolysine-incorporating genes. RESULTS: We propose a strategy to predict pyrrolysine-encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed, and we propose a weighted combination of these features that best explains the currently known pyrrolysine-incorporating genes. We devote special attention to the effect of structural conservation and provide further support for the view that structural conservation may be influential but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine-incorporating genes. CONCLUSIONS: We propose a method for predicting pyrrolysine-incorporating genes in genomes of bacteria and archaea, leading to insights into the factors driving pyrrolysine translation and to the identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline, which is available on request.

Roskilde Universitet

RNAcentral 2021: secondary structure integration, improved sequence search and new member databases

Author: Barshir R
Bateman A
Bouchard-Bourelle P
Bruford E
Cannone JJ
Chan PP
dos Santos G
Finn RD
Fishilevich S
Frankish A
Fromm B
Gorodkin J
Griffiths-Jones S
Gutell RR
Hatzigeorgiou AG
Hoksza D
Kalvari I
Karagkouni D
Karlowski WM
Kay S
Kramarz B
Lovering RC
Lowe TM
Lui LM
Ma L
Mani P
Marygold S
Mestdagh P
Mudge JM
Nawrocki EP
Panni S
Peterson KJ
Petrov A
Petrov AS
Porras P
Ramachandran S
Ribas CE
Scott M
Seal R
Seemann SE
Sweeney BA
Szymanski M
Volders P-J
Weinberg Z
Weng S
Zhang Z
Publication venue: Oxford University Press
Publication date: 08/01/2021

RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world’s largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org

UCL Discovery

RNAalifold: improved consensus structure prediction for RNA alignments

Background: The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach.

Results: We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that, on different datasets, the new version of RNAalifold outperforms not only the old one but also several recently developed tools.

Conclusion: The new version of RNAalifold can not only replace the old one for almost any application, but it is also competitive with other approaches, including those based on SCFGs, maximum expected accuracy, or hierarchical nearest-neighbor classifiers.

Directory of Open Access Journals

Fraunhofer-ePrints

SHOX2 DNA Methylation is a Biomarker for the diagnosis of lung cancer based on bronchial aspirates

Background: This study aimed to show that SHOX2 DNA methylation is a tumor marker in patients with suspected lung cancer, using bronchial fluid aspirated during bronchoscopy. Such a biomarker would be clinically valuable, especially when a final diagnosis cannot be established by histology or cytology after the first bronchoscopy. A test with a low false-positive rate can reduce the need for further invasive and costly procedures and ensure early treatment.

Methods: Marker discovery was carried out by differential methylation hybridization (DMH) and real-time PCR. The real-time PCR-based HeavyMethyl technology was used for quantitative analysis of SHOX2 DNA methylation in bronchial aspirates from two clinical centres in a case-control study. Fresh-frozen and Saccomanno-fixed samples were used to show the tumor marker performance in different sample types of clinical relevance.

Results: Valid measurements were obtained from a total of 523 patient samples (242 controls, 281 cases). DNA methylation of SHOX2 distinguished malignant from benign lung disease (i.e., abscesses, infections, obstructive lung diseases, sarcoidosis, scleroderma, stenoses) with high specificity (68% sensitivity [95% CI 62–73%], 95% specificity [95% CI 91–97%]).

Conclusions: Hypermethylation of SHOX2 in bronchial aspirates appears to be a clinically useful tumor marker for identifying subjects with lung carcinoma, especially if histological and cytological findings after bronchoscopy are ambiguous.

Directory of Open Access Journals

Edge Hill University Research Information Repository

Quantifying variances in comparative RNA secondary structure prediction

Author: A Harmanci
A Novak
AS Schwartz
B Knudsen
CB Do
D Chivian
D Chu
DGH Cedric Notredame
E Freyhult
G Doose
G Lunter
I Hofacker
I Miklos
Ingolfur Edvardsson
IV Walle
J Felsenstein
J Hein
James WJ Anderson
JL Thorne
Jotun Hein
JWJ Anderson
K Katoh
K Lari
LS Wang
M Andronescu
M Kertesz
MA Suchard
Michael Golden
NR Markham
O Gahura
P Gardner
PP Gardner
Preeti Arunapuram
R Chenna
R Dowell
R Lyngsoe
R Satija
RC Edgar
RK Bradley
S Bernhart
S Engelen
S Griffiths-Jones
S Washietl
SE Seemann
Z Sukosd
Zsuzsanna Sükösd
Ádám Novák
Publication venue: Springer Science and Business Media LLC

Examples of sequence conservation analyses capture a subset of mouse long non-coding RNAs sharing homology with fish conserved genomic elements

Author: A Pauli
AJ Vilella
AN Khachane
AR Quinlan
B Bánfai
C Camacho
C Carrieri
C Trapnell
CJ Brown
D Licastro
DA Hosack
DR Kelley
DW Huang
Ferenc Müller
G Bejerano
GA Calin
H Jia
I Ulitsky
IA Qureshi
J Ponjavic
J Sheik Mohamed
J-W Nam
JL Rinn
JM Silva
JN Hutchinson
JP McCutcheon
KC Pang
KS Pollard
L Duret
L Hui
L Kong
LA Pennacchio
M Aoyama
M Guttman
M Lin
ME Dinger
MN Cabili
NR Zearfoss
NT Ingolia
P Carninci
P Flicek
PP Amaral
R Arrial
RA Chodroff
Remo Sanges
S Haider
S Katayama
S Washietl
SE Seemann
SJ Hubbard
SR Eddy
Swaraj Basu
T Fawcett
T Gesell
T Kino
T Ota
T Sing
T-K Kim
TR Dreszer
TR Mercer
UA Ørom
Y Okazaki
Y Sakuraba
Y Zhou
Z Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: Long non-coding RNAs (lncRNAs) are a major class of non-coding RNAs. They are involved in diverse intracellular mechanisms such as molecular scaffolding, splicing and DNA methylation. Through these mechanisms they are reported to play a role in cellular differentiation and development. They show enriched expression in the brain, where they are implicated in maintaining cellular identity, homeostasis, stress responses and plasticity. Low sequence conservation and a lack of functional annotations make it difficult to identify homologs of mammalian lncRNAs in other vertebrates. A computational evaluation of lncRNAs through systematic conservation analyses of both their sequences and their genomic architecture is required.

Results: Our results show that a subset of mouse candidate lncRNAs could be distinguished from random sequences based on their alignment with zebrafish phastCons elements. Using ROC analyses, we were able to define a measure to select significantly conserved lncRNAs. Indeed, starting from ~2,800 mouse lncRNAs, we could predict that between 4 and 11% present conserved sequence fragments in fish genomes. Gene Ontology (GO) enrichment analyses of protein-coding genes proximal to the region of conservation highlighted similar GO classes in both organisms, such as regulation of transcription and central nervous system development. The proximal coding genes in both species show enriched expression in the brain. In summary, we show that interesting genomic regions in zebrafish could be marked based on their sequence homology to a mouse lncRNA, overlap with ESTs and proximity to genes involved in nervous system development.

Conclusions: Conservation at the sequence level can identify a subset of putative lncRNA orthologs. The similar protein-coding neighborhood and transcriptional information about the conserved candidates support the hypothesis that they share functional homology. The pipeline presented here is a proof of principle showing that between 4 and 11% of lncRNAs retain regions of conservation between mammals and fishes. We believe this study will be useful as a reference for analyzing the conservation of lncRNAs in newly sequenced genomes and transcriptomes. © 2013 Basu et al.; licensee BioMed Central Ltd.
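The abstract does not spell out its ROC-based measure. As a hypothetical sketch only, assuming each candidate lncRNA has a numeric conservation score (for example, derived from its alignment to zebrafish phastCons elements) and that scored random sequences form the negative set, one could summarize separability with the ROC AUC and pick a score cutoff that maximizes Youden's J (TPR − FPR). The function names and the use of Youden's J are our assumptions, not the authors' procedure.

```python
from typing import List, Optional, Tuple


def roc_auc(pos: List[float], neg: List[float]) -> float:
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: List[float], neg: List[float]) -> Tuple[Optional[float], float]:
    """Hypothetical selection rule: cutoff t maximizing TPR - FPR for 'score >= t'."""
    best_t, best_j = None, -1.0
    for t in sorted(set(pos) | set(neg)):  # every observed score is a candidate cutoff
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j
```

With made-up scores `pos = [0.9, 0.8, 0.7]` and `neg = [0.1, 0.2, 0.75]`, `roc_auc` gives 8/9 and `youden_cutoff` selects 0.7. The pairwise AUC is quadratic in the set sizes, which is fine for a few thousand candidates; a sort-based rank computation would scale better.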

University of Birmingham Research Portal

Aberdeen University Research

Sissa Digital Library

Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery

Author: A Chao
Alan W. Walker
Amanda Warr
B Buchfink
BD Ondov
BE Suzek
BL Cantarel
C Quast
C-A Duthie
CLC Ip
CT Brown
D Hyatt
D Li
DA Cowan
DD Kang
DD Roumpeka
DE Wood
DH Parks
E Pasolli
F Asnicar
F Rubino
G Henderson
H Li
H Zhang
I Letunic
J Kamke
J Kasparovska
J Koster
J Mistry
J Risse
JR Conway
L Fu
LM Solden
M Hess
M Kanehisa
M Sakamoto
M Watson
Marc D. Auffret
MD Auffret
Mick Watson
MR Olm
N Segata
NJ Loman
O Svartström
P Siguier
R Roehe
R Seshadri
R Vaser
Rainer Roehe
RD Finn
RD Stewart
RJ Wallace
RM Bowers
Robert D. Stewart
S Kittelmann
S Koren
S Kurtz
SA Huws
SE Hoo
T Seemann
The UniProt Consortium.
W Shi
Y Benjamini
Y Peng
Z Yu
Publication venue: Springer Science and Business Media LLC
Publication date: 02/08/2019

The Rowett Institute and SRUC are core funded by the Rural and Environment Science and Analytical Services Division (RESAS) of the Scottish Government. The Roslin Institute forms part of the Royal (Dick) School of Veterinary Studies, University of Edinburgh. This project was supported by the Biotechnology and Biological Sciences Research Council (BBSRC; BB/N016742/1, BB/N01720X/1), including institute strategic programme and national capability awards to The Roslin Institute (BBSRC: BB/P013759/1, BB/P013732/1, BB/J004235/1, BB/J004243/1); and by the Scottish Government as part of the 2016–2021 commission.